National Repository of Grey Literature: 16 records found
Web Application for Graphical Description and Execution of Spark Tasks
Hmeľár, Jozef ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
This master's thesis deals with Big Data processing in the distributed system Apache Spark, using tools that allow remote entry and execution of Spark tasks through a web interface. The author first describes the Spark environment and then focuses on the Apache Livy project, which offers a REST API for running Spark tasks. Contemporary solutions that allow interactive data analysis are presented. The author further describes the design of his own application for the interactive entry and launching of Spark tasks using their graph representation, followed by the web part and the server part of the application. The next section presents the implementation of both parts and, last but not least, a demonstration of the achieved result on a typical task. The created application provides an intuitive interface for comfortable work with the Apache Spark environment, for creating custom components, and for a number of other options that are standard in today's web applications.
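The Livy REST API mentioned in the abstract accepts Spark job submissions as plain HTTP requests. The sketch below, using only the standard library, builds a POST /batches request for a hypothetical Livy server; the endpoint URL, jar path, and class name are illustrative assumptions, not details from the thesis:

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed Livy endpoint; adjust as needed

def build_batch_request(jar, klass, args):
    """Build a POST /batches request that submits a Spark job via Livy."""
    payload = {
        "file": jar,         # path to the application jar (HDFS or local)
        "className": klass,  # Spark application entry point
        "args": args,        # command-line arguments for the job
    }
    return urllib.request.Request(
        LIVY_URL + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_batch_request("hdfs:///jobs/wordcount.jar",
                          "example.WordCount", ["/data/input.txt"])
# urllib.request.urlopen(req) would submit the job; Livy replies with a JSON
# batch descriptor whose "state" can then be polled at /batches/{id}.
print(req.full_url)
```

Polling the returned batch id for its state is how a web front end like the one described can report task progress without blocking.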
Computational tasks for Parallel data processing course
Horečný, Peter ; Rajnoha, Martin (referee) ; Mašek, Jan (advisor)
The goal of this thesis was to create laboratory exercises for the subject „Parallel data processing“ which introduce the options and capabilities of the Apache Spark technology to the students. The exercises focus on basic operations and data preprocessing, and on the concepts and algorithms of machine learning. By following the instructions, the students solve real-world problems using algorithms for linear regression, classification, clustering, and frequent patterns, which shows them the real usage and advantages of Spark. The input data are databases of Czech and Slovak companies with a lot of information, which need to be prepared, filtered, and sorted for further processing in the first exercise. The students also become familiar with functional programming, because the exercises do not contain whole programs but only pieces of instructions, which are not repeated in the following exercises. By completing all the exercises, they gain a comprehensive overview of the possibilities of Spark.
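The linear-regression exercise mentioned above reduces to fitting a line to data. As a minimal, Spark-free illustration of the ordinary least-squares fit that the students would apply through Spark MLlib (the toy data here are invented, not from the exercises):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Perfectly linear toy data: y = 2x + 1
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # → 2.0 1.0
```

In Spark the same fit would be distributed over partitions of a DataFrame; the closed-form arithmetic, however, is the same.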
Computational tasks for solving parallel data processing
Rexa, Denis ; Uher, Václav (referee) ; Mašek, Jan (advisor)
The goal of this diploma thesis was to create four laboratory exercises for the subject "Parallel Data Processing", in which students try out the options and capabilities of Apache Spark as a parallel computing platform. The work also includes the basic setup and use of the Apache Kafka technology and the NoSQL database Apache Cassandra. The other two lab assignments focus on the Travelling Salesman Problem. The first lab was designed to demonstrate the difficulty of a task in which the student faces an exponential increase in complexity. The second consists of an optimization algorithm that solves the problem on a cluster. This algorithm is subjected to performance measurements on clusters. The conclusion of the thesis contains recommendations for optimization as well as a comparison of runs with different numbers of computing devices.
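The exponential blow-up that the first lab demonstrates is easy to see in code: a brute-force tour search examines (n-1)! permutations, so each added city multiplies the work. A plain-Python sketch (not the thesis code) of the brute-force baseline:

```python
import itertools

def shortest_tour(dist):
    """Brute-force TSP: try every tour starting and ending at city 0.
    dist is a symmetric distance matrix; runtime grows as (n-1)!."""
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour

# 4 cities at the corners of a unit square (diagonal ~1.414)
d = [[0, 1, 1.414, 1],
     [1, 0, 1, 1.414],
     [1.414, 1, 0, 1],
     [1, 1.414, 1, 0]]
print(shortest_tour(d))  # the optimal tour walks the perimeter, length 4.0
```

Already at ~13 cities this enumerates billions of tours, which motivates the cluster-based optimization algorithm of the second assignment.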
Model Driven Development of Spark Tasks
Bútora, Matúš ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
The aim of this master's thesis is to describe the Apache Spark framework, its structure, and the way Spark works. The next goal is to present the topics of Model-Driven Development and Model-Driven Architecture and to define their advantages, disadvantages, and usage. The main part of the text, however, is devoted to designing a model for creating tasks in the Apache Spark framework. The text describes an application that allows the user to create a graph based on the proposed modeling language. The final application allows the user to generate source code from the created model.
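The model-to-code step described above can be pictured as walking a small task model and emitting source text from templates. The toy generator below is a hedged illustration of that idea only; the operation names and the emitted PySpark-like strings are invented, not taken from the thesis:

```python
def generate_spark_source(model):
    """Emit PySpark-like source text from a linear task model.
    model: list of (operation, argument) pairs forming a pipeline."""
    templates = {
        "read":   'df = spark.read.csv("{0}")',
        "filter": 'df = df.filter("{0}")',
        "write":  'df.write.parquet("{0}")',
    }
    lines = [templates[op].format(arg) for op, arg in model]
    return "\n".join(lines)

model = [("read", "input.csv"), ("filter", "age > 18"), ("write", "out")]
print(generate_spark_source(model))
```

A real model-driven generator would walk a graph conforming to a meta-model rather than a flat list, but the template-expansion core is the same.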
Network Traces Analysis Using Apache Spark
Béder, Michal ; Veselý, Vladimír (referee) ; Ryšavý, Ondřej (advisor)
The aim of this thesis is to show how to design and implement an application for network trace analysis using the Apache Spark distributed system. The implementation can be divided into three parts: loading data from distributed HDFS storage, analysis of the supported network protocols, and distributed data processing. The web-based notebook Apache Zeppelin is used as the data visualization tool. The resulting application is able to analyze individual packets as well as entire flows. It supports JSON and pcap as input data formats. The goal of the application is to enable Big Data processing. The input data format and the allocation of the available cores have the greatest impact on its performance.
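Aggregating packets into flows, one of the analyses such an application performs, amounts to keying packets by their 5-tuple. A simplified stdlib sketch (the field names are assumptions for illustration, not the thesis's data model):

```python
from collections import defaultdict

def group_flows(packets):
    """Aggregate packets into flows keyed by the 5-tuple
    (src ip, dst ip, src port, dst port, protocol)."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["len"]
    return dict(flows)

pkts = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80,
     "proto": "tcp", "len": 60},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80,
     "proto": "tcp", "len": 1500},
]
flows = group_flows(pkts)
print(flows)  # one flow with 2 packets, 1560 bytes
```

In Spark the same aggregation would typically be expressed as a groupBy over the key columns, letting the cluster do the reduction in parallel.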
Model Driven Development of Spark Tasks by Means of Eclipse Acceleo
Šalgovič, Marek ; Bartík, Vladimír (referee) ; Rychlý, Marek (advisor)
This master's thesis deals with the model-driven development of Big Data tasks in the Apache Spark environment. First, the reader is introduced to the Apache Spark framework and the necessary details. Next, the topic of model-driven development is presented and its advantages and disadvantages are described. The second part describes the proposed meta-model for modeling Spark tasks. The properties of the designed profile diagram, which extends the class diagram, are described in detail. Subsequently, a code generator is implemented whose input is models conforming to the proposed meta-model. The thesis also contains example models and their evaluation.
Porting of Plaso Extractors to the Apache Spark Platform
Baláž, Miroslav ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
The theoretical part discusses the functioning and architecture of the Plaso tool. The thesis further explores current tools that implement distributed computational models, describing their architecture, data abstractions, and how they work. It also describes current tools that implement distributed storage. The work includes the creation of the Plasospark tool, which moves the computation of the Plaso tool to the Spark platform and uses Hadoop HDFS storage for forensic data.
Data Lineage Analysis of Frameworks with Complex Interaction Patterns
Hýbl, Oskar ; Parízek, Pavel (advisor) ; Hnětynka, Petr (referee)
Manta Flow is a tool for analyzing data flows in an enterprise environment. It features the Java scanner, a module that uses static analysis to determine the flows through Java applications. To analyze an application using some framework, the scanner requires a dedicated plugin. Although the Java scanner provides plugins for several frameworks, to be usable for real applications it is essential that the scanner support as many frameworks as possible, which requires the implementation of new plugins. Applications using Apache Spark, a framework for cluster computing, are increasingly popular. We therefore designed and implemented a Java scanner plugin that allows the scanner to analyze Spark applications. As Spark focuses on data processing, this presented several challenges that were not encountered in other frameworks. In particular, it was necessary to resolve the data schema in various scenarios and track the schema changes throughout any operations invoked on the data. Of the multiple APIs Spark provides for data processing, we focused on the Spark SQL module, notably on Dataset, omitting the legacy RDD. We also implemented support for data access, covering JDBC and chosen file formats. The implementation has been thoroughly tested and is proven to work correctly as a part of Manta Flow, which features the plugin in...
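Tracking schema changes through dataset operations, the core challenge the abstract mentions, can be sketched as propagating a column list through each operation while recording which output columns derive from which inputs. This is a deliberately simplified model for illustration, not Manta's actual lineage representation:

```python
def apply_op(schema, op):
    """Propagate a schema through one operation and record lineage edges.
    schema: list of column names.
    op: ("select", cols) or ("withColumn", new_col, source_cols)."""
    if op[0] == "select":
        cols = op[1]
        # each kept column maps to itself; dropped columns produce no edge
        return list(cols), [(c, c) for c in cols]
    if op[0] == "withColumn":
        _, new_col, sources = op
        out = schema + [new_col] if new_col not in schema else list(schema)
        edges = [(c, c) for c in schema] + [(s, new_col) for s in sources]
        return out, edges
    raise ValueError("unsupported operation: " + op[0])

schema = ["name", "salary"]
schema, edges = apply_op(schema, ("withColumn", "bonus", ["salary"]))
print(schema)  # → ['name', 'salary', 'bonus']
print(edges)   # includes ('salary', 'bonus')
```

Chaining such per-operation edge lists and joining them end to end is what yields a column-level data-flow graph across a whole application.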
